AITopics | multimodal data fusion

Collaborating Authors

multimodal data fusion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DF-DM: A foundational process model for multimodal data fusion in the artificial intelligence era

Restrepo, David, Wu, Chenwei, Vásquez-Venegas, Constanza, Nakayama, Luis Filipe, Celi, Leo Anthony, López, Diego M

arXiv.org Artificial IntelligenceJun-2-2024

In the big data era, integrating diverse data modalities poses significant challenges, particularly in complex fields like healthcare. This paper introduces a new process model for multimodal Data Fusion for Data Mining, integrating embeddings and the Cross-Industry Standard Process for Data Mining with the existing Data Fusion Information Group model. Our model aims to decrease computational costs, complexity, and bias while improving efficiency and reliability. We also propose "disentangled dense fusion", a novel embedding fusion method designed to optimize mutual information and facilitate dense inter-modality feature interaction, thereby minimizing redundant information. We demonstrate the model's efficacy through three use cases: predicting diabetic retinopathy using retinal images and patient metadata, domestic violence prediction employing satellite imagery, internet, and census data, and identifying clinical and demographic features from radiography images and clinical notes. The model achieved a Macro F1 score of 0.92 in diabetic retinopathy prediction, an R-squared of 0.854 and sMAPE of 24.868 in domestic violence prediction, and a macro AUC of 0.92 and 0.99 for disease prediction and sex classification, respectively, in radiological analysis. These results underscore the Data Fusion for Data Mining model's potential to significantly impact multimodal data processing, promoting its adoption in diverse, resource-constrained settings.

foundation model, fusion, modality, (15 more...)

arXiv.org Artificial Intelligence

2404.12278

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
(18 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.94)
Research Report > Promising Solution (0.67)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.69)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications

Restrepo, David, Wu, Chenwei, Cajas, Sebastián Andrés, Nakayama, Luis Filipe, Celi, Leo Anthony, López, Diego M

arXiv.org Artificial IntelligenceJun-1-2024

Large-scale multi-modal deep learning models have revolutionized domains such as healthcare, highlighting the importance of computational power. However, in resource-constrained regions like Low and Middle-Income Countries (LMICs), limited access to GPUs and data poses significant challenges, often leaving CPUs as the sole resource. To address this, we advocate for leveraging vector embeddings to enable flexible and efficient computational methodologies, democratizing multimodal deep learning across diverse contexts. Our paper investigates the efficiency and effectiveness of using vector embeddings from single-modal foundation models and multi-modal Vision-Language Models (VLMs) for multimodal deep learning in low-resource environments, particularly in healthcare. Additionally, we propose a simple yet effective inference-time method to enhance performance by aligning image-text embeddings. Comparing these approaches with traditional methods, we assess their impact on computational efficiency and model performance using metrics like accuracy, F1-score, inference time, training time, and memory usage across three medical modalities: BRSET (ophthalmology), HAM10000 (dermatology), and SatelliteBench (public health). Our findings show that embeddings reduce computational demands without compromising model performance. Furthermore, our alignment method improves performance in medical tasks. This research promotes sustainable AI practices by optimizing resources in constrained environments, highlighting the potential of embedding-based approaches for efficient multimodal learning. Vector embeddings democratize multimodal deep learning in LMICs, particularly in healthcare, enhancing AI adaptability in varied use cases.

dataset, fusion, learning, (15 more...)

arXiv.org Artificial Intelligence

2406.02601

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Therapeutic Area > Dermatology (0.66)
Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction

Ding, Kexin, Zhou, Mu, Metaxas, Dimitris N., Zhang, Shaoting

arXiv.org Artificial IntelligenceJul-21-2023

Survival outcome assessment is challenging and inherently associated with multiple clinical factors (e.g., imaging and genomics biomarkers) in cancer. Enabling multimodal analytics promises to reveal novel predictive patterns of patient outcomes. In this study, we propose a multimodal transformer (PathOmics) integrating pathology and genomics insights into colon-related cancer survival prediction. We emphasize the unsupervised pretraining to capture the intrinsic interaction between tissue microenvironments in gigapixel whole slide images (WSIs) and a wide range of genomics data (e.g., mRNA-sequence, copy number variant, and methylation). After the multimodal knowledge aggregation in pretraining, our task-specific model finetuning could expand the scope of data utility applicable to both multi- and single-modal data (e.g., image- or genomics-only). We evaluate our approach on both TCGA colon and rectum cancer cohorts, showing that the proposed approach is competitive and outperforms state-of-the-art studies. Finally, our approach is desirable to utilize the limited number of finetuned samples towards data-efficient analytics for survival outcome prediction. The code is available at https://github.com/Cassie07/PathOmics.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2307.11952

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)
North America > United States > New Jersey (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.69)
Health & Medicine > Therapeutic Area > Gastroenterology (0.68)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Explaining Multimodal Data Fusion: Occlusion Analysis for Wilderness Mapping

Ekim, Burak, Schmitt, Michael

arXiv.org Artificial IntelligenceApr-5-2023

Jointly harnessing complementary features of multi-modal input data in a common latent space has been found to be beneficial long ago. However, the influence of each modality on the models decision remains a puzzle. This study proposes a deep learning framework for the modality-level interpretation of multimodal earth observation data in an end-to-end fashion. While leveraging an explainable machine learning method, namely Occlusion Sensitivity, the proposed framework investigates the influence of modalities under an early-fusion scenario in which the modalities are fused before the learning process. We show that the task of wilderness mapping largely benefits from auxiliary data such as land cover and night time light data.

artificial intelligence, machine learning, modality, (18 more...)

arXiv.org Artificial Intelligence

2304.02407

Country: Europe > Germany (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Multimodal Data Fusion based on the Global Workspace Theory

Bao, Cong, Fountas, Zafeirios, Olugbade, Temitayo, Bianchi-Berthouze, Nadia

arXiv.org Machine LearningJan-26-2020

We propose a novel neural network architecture, named the Global Workspace Network (GWN), that addresses the challenge of dynamic uncertainties in multimodal data fusion. The GWN is inspired by the well-established Global Workspace Theory from cognitive science. We implement it as a model of attention, between multiple modalities, that evolves through time. The GWN achieved F1 score of 0.92, averaged over two classes, for the discrimination between patient and healthy participants, based on the multimodal EmoPain dataset captured from people with chronic pain and healthy people performing different types of exercise movements in unconstrained settings. In this task, the GWN significantly outperformed a vanilla architecture. It additionally outperformed the vanilla model in further classification of three pain levels for a patient (average F1 score = 0.75) based on the EmoPain dataset. We further provide extensive analysis of the behaviour of GWN and its ability to deal with uncertainty in multimodal data.

architecture, modality, multimodal data fusion, (13 more...)

arXiv.org Machine Learning

2001.09485

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(7 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Vehicle Tracking Using Surveillance with Multimodal Data Fusion

Zhang, Yue, Song, Bin, Du, Xiaojiang, Guizani, Mohsen

arXiv.org Artificial IntelligenceOct-29-2018

Abstract--Vehicle location prediction or vehicle tracking is a significant topic within connected vehicles. This task, however, is difficult if only a single modal data is available, probably causing bias and impeding the accuracy. With the development of sensor networks in connected vehicles, multimodal data are becoming accessible. Therefore, we propose a framework for vehicle tracking with multimodal data fusion. Images, being processed in the module of vehicle detection, provide direct information about the features of vehicles, whereas velocity estimation can further evaluate the possible location of the target vehicles, which reduces the number of features being compared, and decreases the time consumption and computational cost. Vehicle detection is designed with a color-faster R-CNN, which takes both the shape and color of the vehicles into consideration. Meanwhile, velocity estimation is through the Kalman filter, which is a classical method for tracking. Finally, a multimodal data fusion method is applied to integrate these outcomes so that vehicle-tracking tasks can be achieved. Experimental results suggest the efficiency of our methods, which can track vehicles using a series of surveillance cameras in urban areas. ITH technological advancements in vehicles and transportation system, motorists require comfort and intelligent driving, not only mobility. Thus, there has been a great deal of research which mainly falls into one of two directions. On one hand, researchers tend to develop more intelligent vehicles, or devices that can be attached to vehicles, bringing up several popular topics such as autonomous vehicles or driverless vehicles [1].

artificial intelligence, information fusion, vehicle, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TITS.2017.2787101

1811.02627

Country:

Asia > China (0.69)
North America > United States > Idaho (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (0.93)
Transportation (0.88)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.88)

Add feedback